Q13. What is model distillation, and how is it applied to LLMs?
Ans - Model distillation is a technique where a smaller, simpler
model (student) is trained to replicate the behavior of a larger, more
complex model (teacher). In the context of LLMs, the student model
learns to match the teacher's soft output distributions (probabilities
over tokens) rather than one-hot labels, capturing the relative
likelihoods the teacher assigns to alternative outputs. This approach
reduces computational requirements and memory usage while
retaining much of the teacher's performance, making it well suited
for deploying LLMs on resource-constrained devices.
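The core of this training signal can be sketched as a temperature-scaled KL divergence between teacher and student predictions. This is a minimal illustration, not any particular framework's API; the logit values and temperature are made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T produces "softer" distributions
    # that expose more of the teacher's knowledge about wrong answers.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence KL(p_teacher || q_student) over the softened distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q)
    )

# Identical logits give zero loss; diverging logits give a positive loss.
teacher = [3.0, 1.0, 0.2]
student = [2.5, 1.2, 0.4]
loss = distillation_loss(teacher, student)
```

In practice this soft-label loss is usually combined with the ordinary cross-entropy loss on the ground-truth labels, weighted by a mixing coefficient.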
Bhavishya Pandit
Q14. How do LLMs handle out-of-vocabulary (OOV) words?
Ans - Out-of-vocabulary words are words that do not appear as whole
units in the model's fixed vocabulary. LLMs address this issue through subword
tokenization techniques like Byte-Pair Encoding (BPE) and
WordPiece. These methods break down OOV words into smaller,
known subword units. For example, the word “unhappiness” might be
tokenized as “un,” “happi,” and “ness.” This allows the model to
understand and generate words it has never seen before by
leveraging these subword components.
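The segmentation step can be sketched with a greedy longest-match algorithm, similar in spirit to WordPiece inference (real BPE learns its merges from corpus statistics; the tiny vocabulary here is invented to reproduce the "unhappiness" example above).

```python
def subword_tokenize(word, vocab):
    # Greedy longest-match segmentation: at each position, take the
    # longest vocabulary piece that prefixes the remaining text.
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit an unknown-token marker
            # and skip one character.
            pieces.append("<unk>")
            i += 1
    return pieces

# Toy vocabulary containing the subword units from the example.
vocab = {"un", "happi", "ness", "happy"}
tokens = subword_tokenize("unhappiness", vocab)  # ["un", "happi", "ness"]
```

Because every piece is a known vocabulary entry with its own learned embedding, the model can compose representations for words it never saw in full during training.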